
perf: increase the cache hit rate of group context awareness#8226

Merged
Soulter merged 17 commits into AstrBotDevs:master from RC-CHN:refactor-ltm
May 30, 2026

@RC-CHN RC-CHN commented May 18, 2026

Member

Reopen of #8144 with a cleaned-up commit history.

  • Add raw_records / contexts / summaries data model per group
  • Add LLM summary compaction strategy alongside truncation
  • Add turn-based (_split_into_rounds) granularity
  • Add image caption integration into LTM history
  • Add tool_call / tool_result persistence into raw_records
  • Add active reply support driven by LTM state
  • Improve summary injection prefix with system note and delimiters
  • Add info-level logging for summary compaction lifecycle
  • Clarify default summary prompt with explicit preserve/drop rules
  • Add context_guard for history overflow protection in agent runner
  • Add internal agent history compaction in agent_sub_stages
  • Add comprehensive LTM unit tests and compaction test suites
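The turn-based granularity listed above can be pictured with a small sketch. It assumes a round begins at each `user` message and runs until the next one, so an assistant tool call and its tool results are never separated. The function names and that boundary rule are illustrative assumptions, not the PR's actual `_split_into_rounds`.

```python
from typing import Any

Message = dict[str, Any]


def split_into_rounds(messages: list[Message]) -> list[list[Message]]:
    """Group OpenAI-format messages into rounds that each start at a user message."""
    rounds: list[list[Message]] = []
    for msg in messages:
        # A user message opens a new round; anything else joins the current one.
        if msg.get("role") == "user" or not rounds:
            rounds.append([msg])
        else:
            rounds[-1].append(msg)
    return rounds


def truncate_by_rounds(messages: list[Message], keep_last: int) -> list[Message]:
    """Drop whole rounds from the front, keeping the most recent `keep_last`."""
    rounds = split_into_rounds(messages)
    kept = rounds[-keep_last:] if keep_last > 0 else []
    return [msg for rnd in kept for msg in rnd]
```

Truncating by whole rounds rather than by message count avoids leaving a `tool` result in the history without the assistant call that produced it.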

Modifications

Core: astrbot/builtin_stars/astrbot/long_term_memory.py

  • Replace max_cnt ring buffer with raw_records (deque) + _raw_cursor + contexts (append-only list). Old segments are never rebuilt.
  • _build_segments() converts raw chat lines into OpenAI-format context segments, handling tool calls, parallel tools, and multi-step chains.
  • <BOT/> markers replace [You/] to avoid nickname collisions.
  • on_agent_done records tool-call chains and now includes the @bot prompt in contexts so future rounds see the user's original message.
  • asyncio.Lock for concurrency safety; remove_session() for cleanup.
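A minimal sketch of the cursor-plus-append-only idea described above, under stated assumptions: the class `GroupMemory`, its method names, and the rule that a line starting with `<BOT/>` becomes an assistant message are all hypothetical stand-ins, and the real `_build_segments()` also handles tool calls. The point it illustrates is that earlier segments stay identical between requests, which is presumably what lets a provider-side prompt cache keep hitting.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class GroupMemory:
    """Hypothetical per-group state: raw lines, a cursor, and built segments."""

    raw_records: deque[str] = field(default_factory=deque)
    raw_cursor: int = 0  # number of raw records already converted
    contexts: list[dict[str, str]] = field(default_factory=list)

    def record(self, line: str) -> None:
        self.raw_records.append(line)

    def build_pending(self) -> list[dict[str, str]]:
        """Convert only the records past the cursor and append them to contexts."""
        pending = list(self.raw_records)[self.raw_cursor:]
        for line in pending:
            # Assumed marker convention: bot lines start with "<BOT/>".
            role = "assistant" if line.startswith("<BOT/>") else "user"
            self.contexts.append({"role": role, "content": line})
        self.raw_cursor = len(self.raw_records)
        return self.contexts
```

Because `build_pending()` never rewrites entries it has already emitted, the message list sent on one request is a strict prefix of the list sent on the next.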

Hook wiring: astrbot/builtin_stars/astrbot/main.py

  • Swap @on_llm_response → @on_agent_done for accurate tool-chain recording.
  • Lazy toggle detection: false→true cleans stale state on next message.
  • group_icl_enable=true skips Conversation DB query (conversation=None).
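The lazy toggle detection above could look roughly like the following. This is a sketch under assumptions: `ToggleTracker` and its fields are hypothetical, and the only behavior taken from the description is that a false→true flip is noticed on the next message and clears stale state at that point, rather than when the setting is changed.

```python
class ToggleTracker:
    """Hypothetical helper: detect a false→true flip of a per-session setting."""

    def __init__(self) -> None:
        self._last_seen: dict[str, bool] = {}
        self.state: dict[str, list[str]] = {}

    def on_message(self, session_id: str, enabled: bool, line: str) -> None:
        was_enabled = self._last_seen.get(session_id, False)
        self._last_seen[session_id] = enabled
        if not enabled:
            return  # nothing is recorded while the feature is off
        if not was_enabled:
            # false→true: discard whatever was left from the previous enabled period
            self.state.pop(session_id, None)
        self.state.setdefault(session_id, []).append(line)
```

Checking the flag on each incoming message means no config-change callback is needed; the cost is that stale state lingers in memory until the next message arrives.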

Config: astrbot/builtin_stars/astrbot/default.py

  • Default context_limit_reached_strategy → "llm_compress".

Agent runner: astrbot/core/astr_main_agent.py

  • _get_compress_provider auto-falls back to the main chat provider when llm_compress_provider_id is unset, preventing silent truncation.
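The fallback rule can be sketched as a small selection function. The name `pick_compress_provider`, the `providers` mapping, and the choice to also fall back when the configured ID is unknown are assumptions for illustration; the description only states the fallback for an unset `llm_compress_provider_id`.

```python
from typing import Optional


def pick_compress_provider(
    providers: dict[str, object],
    compress_provider_id: Optional[str],
    main_provider: object,
) -> object:
    """Return the configured compression provider, else the main chat provider."""
    if compress_provider_id:
        configured = providers.get(compress_provider_id)
        if configured is not None:
            return configured
    # Unset (or, by assumption, unknown) ID: reuse the main provider rather than
    # returning None, which would leave the caller with plain truncation only.
    return main_provider
```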

Tests: tests/unit/test_long_term_memory.py

  • Pure functions: extract, parse, truncate, build_segments.

  • Integration: round-trip lifecycle, multi-round accumulation, tool chains, persona preservation, concurrent safety.

  • This is NOT a breaking change.

Screenshots or Test Results

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

@auto-assign auto-assign Bot requested review from advent259141 and anka-afk May 18, 2026 07:20
@dosubot dosubot Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label May 18, 2026

@sourcery-ai sourcery-ai Bot left a comment

Contributor

Sorry @RC-CHN, your pull request is larger than the review limit of 150000 diff characters

@dosubot dosubot Bot added area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 18, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a significant upgrade to the Long-Term Memory (LTM v2) system, implementing a more robust architecture for group chat context management. Key changes include the introduction of a RequestContextGuard to protect provider requests from token limits without mutating persistent history, and the implementation of dual compaction strategies (turn-based truncation and LLM-based summarization) for both group and private chats. Feedback identifies several critical issues: potential blocking of message recording due to holding locks across network calls, memory leaks from uncleaned session locks, and resource leaks from uncancelled tasks in the tool loop. Additionally, improvements were suggested for handling malformed JSON in tool arguments, preventing system prompt duplication, and ensuring consistent fallback logic for compression providers.
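One way to address the first concern in this review, the lock being held across a network call, is to narrow the lock to the in-memory steps. The sketch below is illustrative only: `SessionMemory`, `compact`, and the `summarize` callback are hypothetical names, and it assumes records are only ever appended at the end, so the first `len(snapshot)` entries are exactly the ones that were summarized.

```python
import asyncio


class SessionMemory:
    """Hypothetical store illustrating lock scope around a slow summary call."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()
        self.records: list[str] = []
        self.summary = ""

    async def record(self, line: str) -> None:
        async with self.lock:
            self.records.append(line)

    async def compact(self, summarize) -> None:
        # 1. Snapshot under the lock (fast, no network awaits inside).
        async with self.lock:
            snapshot = list(self.records)
        # 2. Call the model without holding the lock, so record() is not blocked.
        summary = await summarize(snapshot)
        # 3. Re-acquire the lock and drop only the lines that were summarized.
        async with self.lock:
            self.summary = summary
            del self.records[: len(snapshot)]
```

Messages that arrive while the summary call is in flight are appended after the snapshot and survive step 3 untouched. If two compactions could run concurrently, an extra guard would be needed; this sketch does not cover that case.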

Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/core/astr_main_agent.py
Comment thread astrbot/core/agent/runners/tool_loop_agent_runner.py

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request implements LongTermMemory v2, introducing sophisticated context management for group chats, including logical round splitting and dual compaction strategies (truncation and LLM-based summarization). It also adds a RequestContextGuard to ensure provider requests stay within token limits without mutating canonical history. Feedback identifies several areas for refinement: the raw_records buffer needs trimming during message handling to prevent memory exhaustion, and robust error handling should be added to tool call parsing. Furthermore, the reviewer pointed out a memory leak in the session lock dictionary, suggested optimizing string length calculations for memory checks, and noted that the compression provider fallback logic needs to be correctly implemented to match the intended design.
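The request for robust error handling in tool call parsing could be met along these lines. This is a sketch, not the PR's code: `parse_tool_arguments` and the `_raw` fallback key are hypothetical, and it assumes the arguments arrive either as a JSON string or as an already decoded dict.

```python
import json
from typing import Any


def parse_tool_arguments(raw: Any) -> dict[str, Any]:
    """Decode a tool call's `arguments` field without raising on bad input."""
    if isinstance(raw, dict):
        return raw
    if not isinstance(raw, str) or not raw.strip():
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Keep the raw text so the malformed call is still visible in history.
        return {"_raw": raw}
    # Valid JSON that is not an object (e.g. a list) is treated the same way.
    return parsed if isinstance(parsed, dict) else {"_raw": raw}
```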

Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/builtin_stars/astrbot/long_term_memory.py Outdated
Comment thread astrbot/core/astr_main_agent.py
RC-CHN added 9 commits May 19, 2026 09:40
- Add raw_records / contexts / summaries data model per group
- Add LLM summary compaction strategy alongside truncation
- Add turn-based (_split_into_rounds) granularity
- Add image caption integration into LTM history
- Add tool_call / tool_result persistence into raw_records
- Add active reply support driven by LTM state
- Improve summary injection prefix with system note and delimiters
- Add info-level logging for summary compaction lifecycle
- Clarify default summary prompt with explicit preserve/drop rules
- Add context_guard for history overflow protection in agent runner
- Add internal agent history compaction in agent_sub_stages
- Add comprehensive LTM unit tests and compaction test suites
- Treat lines starting with <T:CALL>, <T:RES, or <BOT/ as regular user
  messages when their respective parsers return None, instead of silently
  dropping them. Defensive guard against malformed internal markers.
Avoid allocating a new bytes object for every string when calculating
buffer size in _trim_raw_records. Character count is sufficient for
the approximate memory cap.
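The character-count trimming described in this commit message might look like the sketch below. The signature, the `max_chars` parameter, and the rule of always keeping the newest record are assumptions; the actual `_trim_raw_records` is not shown in this conversation.

```python
from collections import deque


def trim_raw_records(records: deque[str], max_chars: int) -> int:
    """Pop the oldest records until the total character count fits `max_chars`.

    Returns the number of records dropped. len(s) counts characters and does
    not allocate, unlike len(s.encode("utf-8")), which builds a bytes object.
    """
    total = sum(len(r) for r in records)
    dropped = 0
    # Always keep at least the newest record, even if it alone exceeds the cap.
    while total > max_chars and len(records) > 1:
        total -= len(records.popleft())
        dropped += 1
    return dropped
```

A caller that tracks a cursor into the same deque would need to subtract the returned count from that cursor after trimming.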
# Conflicts:
#	astrbot/builtin_stars/astrbot/main.py
w31r4 commented May 23, 2026

Contributor

fix conflict

@w31r4 w31r4 self-assigned this May 23, 2026
@w31r4 w31r4 self-requested a review May 23, 2026 16:31

@w31r4 w31r4 left a comment

Contributor

LGTM

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 23, 2026
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels May 30, 2026
@Soulter Soulter merged commit 95d8057 into AstrBotDevs:master May 30, 2026
21 checks passed
@Soulter Soulter changed the title refactor(ltm): redesign long-term memory with context compaction (reopen of #8144) perf: increase the cache hit rate of group context awareness May 30, 2026
@RC-CHN RC-CHN deleted the refactor-ltm branch May 30, 2026 11:04
fallback_providers = _get_fallback_chat_providers(
    provider, plugin_context, config.provider_settings
)
selected_provider = _select_image_chat_provider(provider, req, fallback_providers)

@Dt8333 Dt8333 Jun 2, 2026

Member

#8498
Why did the parameter enforce_max_turns=config.max_context_length need to be removed here?

Member

? How did this comment end up attached here?

Foolllll-J added a commit to Foolllll-J/AstrBot that referenced this pull request Jun 2, 2026
…s for llm compress, fix AftCompact debug log

Three context-compaction regression fixes after AstrBotDevs#8226:

1. Restore max_context_length -> enforce_max_turns propagation so
   normal turn-based truncation works again.
2. Serialize ContentPart and ToolCall objects into plain dicts in
   _message_to_dict so llm_compress no longer fails with JSON
   serialization errors.
3. Print _provider_messages (compacted) instead of run_context.messages
   (unchanged) in AftCompact debug log; truncate long role lists to
   first4,...,last4 to avoid log spam.

Assertions in tests are also hardened to avoid coupling to exact prompt
wording.
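Fixes 2 and 3 above can be illustrated with a short sketch. Everything named here is hypothetical: `TextPart` is a dataclass stand-in for a content part (the real `ContentPart` and `ToolCall` types may be built differently), `to_plain` is not the project's `_message_to_dict`, and `summarize_roles` only mirrors the "first4,...,last4" log format described in the message.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any


@dataclass
class TextPart:
    """Hypothetical stand-in for a structured content part."""

    type: str
    text: str


def to_plain(value: Any) -> Any:
    """Recursively turn dataclass instances into dicts so json.dumps accepts them."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(v) for v in value]
    return value


def summarize_roles(roles: list[str], edge: int = 4) -> str:
    """Join roles for a debug log, eliding the middle of long lists."""
    if len(roles) <= 2 * edge:
        return ",".join(roles)
    return ",".join([*roles[:edge], "...", *roles[-edge:]])
```

Converting to plain dicts before serialization keeps the compression path independent of whichever object model the provider layer uses for message parts.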
Soulter added a commit that referenced this pull request Jun 3, 2026
…context compression, handle compression model modalities (#8530)

* fix(context): restore turn cap, serialize content parts and tool calls for llm compress, fix AftCompact debug log

Three context-compaction regression fixes after #8226:

1. Restore max_context_length -> enforce_max_turns propagation so
   normal turn-based truncation works again.
2. Serialize ContentPart and ToolCall objects into plain dicts in
   _message_to_dict so llm_compress no longer fails with JSON
   serialization errors.
3. Print _provider_messages (compacted) instead of run_context.messages
   (unchanged) in AftCompact debug log; truncate long role lists to
   first4,...,last4 to avoid log spam.

Assertions in tests are also hardened to avoid coupling to exact prompt
wording.

* fix(tool_loop_agent_runner): simplify context handling by removing redundant provider messages

* fix(tool_loop_agent_runner): rename context manager variables for clarity

* fix: update context compression to use recent token ratio instead of fixed count

* fix: enhance LLMSummaryCompressor to sanitize contexts and improve message handling

* ruff format

---------

Co-authored-by: Soulter <905617992@qq.com>

Labels

area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. lgtm This PR has been approved by a maintainer size:XL This PR changes 500-999 lines, ignoring generated files.
